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"ti ■ This note compares the single-shot and intensity cross-correlation 

'"O ! proposals for x-ray imaging of randomly oriented particles and 

c/2 ■ shows very directly that the latter will usually not be feasible even 

. y . when the former is. 
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,£>-! • 1 Introduction 

> 

^ ■ The health and growth of any field of science can often be traced to a few, singularly creative and visionary 

individuals whose contributions are pervasive. Such is the case in the relatively new field of coherent imag- 
ing, as the application of even an unsophisticated "common-element" algorithm to author lists will confirm 
[1-16]. Scientists in this class succeed by not being overly inhibited by the relentless filters of practicality, 
plausibility, and even feasibility. This task falls to a lower rank of inquiry, and it is this function that the 

O ! present contribution is meant to serve. 

This note considers the problem of imaging identical, randomly oriented particles with diffraction sig- 
nals. We first review two imaging schemes that would appear to be quite different in their experimental 
requirements. This is shown to be false: the feasibility of both methods depends critically on the mean 
number of photons scattered per particle and one of the schemes is superior in this respect. 

Our discussion only considers the reconstruction of the 3D intensity distribution of the particle. The 
easier problem of reconstructing the particle contrast from the intensity, i.e. reconstructing the phase, is 
treated at length elsewhere lTP71[l"8ll . 



2 Two data collection schemes 



X-ray free-electron lasers offer the possibility of imaging individual particles that currently have to be crys- 
tallized in order to produce sufficient signal [ 19]. The increase in brightness over synchrotron sources is so 
enormous that the best way of utilizing this new resource is still wide open. We will discuss two methods 
that have been proposed. Common to both methods is a train of short (10 fs) pulses, each comprising a fixed 
number of photons (10 12 ). The methods differ in how the pulses are applied to the particles — of which we 
have an unlimited supply — and how the data is collected. 



2.1 Single-shot data 

In the single-shot scheme lfl9l l20l the pulses are focussed to a small area A ss , such that each pulse hits 
at most one particle. On average, each hit elastically scatters N ss photons into the range of angles useful 
for structure determination. The data collected in this scheme are sets of photon counts K p {(\i), where p 
identifies the hit (pulse) and q; is the spatial frequency associated with photons scattered into detector pixel 
i. The time between pulses (10 ms) is assumed to be sufficient for detector read-out and initialization for 
the next pulse. Although the particles are identical in structure (at the resolution of interest), the particle 
orientation in each hit is random and unknown. The duration of the pulses is short enough that rotational 
motion and particle explosion (due to inelastic processes) have a negligible effect on the elastically scattered 
photons. 

The goal of the single-shot scheme is to determine the unique 3D intensity distribution J(q) of a single 
particle that has the highest probability of "explaining" all of the data K p (qi) when the particle in each hit 
is assigned a suitable rotation R p and the corresponding photon fluence <f) p a particular value. To determine 
I(q) one maximizes the log-likelihood function ||2T1 

J2 log P ({K P (m)} i= i,... | J(q), R P , (f> P ) (1) 

p 

where P(. ..[...) is the conditional probability, based on Poisson statistics, of the data from pulse p given 
the model 7(q) and parameters R p , <p p . This intensity reconstruction problem is already interesting when we 
do not limit the number of hits. In this case it is only the characteristics of the incident pulse, or equivalently 
the average number of scattered photons N ss , that can effect the feasibility of the method. We provide the 
same resources — unlimited supply of particles and x-ray pulses — to the competing scheme (next section). 



2.2 Cross-correlation data 



The alternative scheme is based on intensity cross-correlations 11221 1231 . Here the same pulses are focussed 
to a much larger area A cc , where they scatter from an ensemble of many particleqj which we again assume 
are identical except for orientation which is random and unknown. The average number of photons scattered 
per particle, relative to the single-shot method, is reduced by the ratio of areas: N cc = (A SS /A CC )N SS . 
We assume the focus area A cc is large relative to the transverse coherence length of the pulse so that the 



1 As originally proposed |22| the particles would be in solution; in free-electron laser experiments the particle ensemble would 
be prepared in a vacuum. 



diffraction signals of the particles combine incoherently. The raw data in this method are the photon counts 
K p (qi) and products K p (qj)Kp(qj), recorded by the detector after every pulse p scatters from the particle 
ensemble. Unlike the single-shot scheme, which records data for every pulse (hit), the raw data here are 
averaged over pulses to obtain the cross-correlations 

C(qi>qj) = (K p (qi)K p (qj))p - (K p {cn)) p (K p (qj)) p . (2) 

As in the single-shot scheme we assume the number of pulses and number of particles (in the ensemble) is 
unlimited so that the errors in these averages can be suitably small. Thus the feasibility of the method can 
only depend on the average number of photons scattered per particle, N cc . 

The goal of the cross-correlation scheme is the same as for the single-shot scheme: to determine the 
unique 3D intensity /(q) of a single particle that is consistent with the data, in this case the cross-correlations. 
This translates to the set of equations 

C(q, q') = </(R • q)/(R • q')>R - </(R ■ q)>R</(R ■ q')>R, (3) 

for the function i"(q) at all pairs of measured spatial frequencies q and q'. The averages in this equation 
are with respect to the rotations R of the particle axes relative to the incident beam. In the absence of any 
particle alignment mechanism the distribution of R is uniform. 

The cross-correlation scheme has several practical advantages over the single-shot scheme. Since the 
diffraction signal recorded from each pulse is proportional to the number of particles in the ensemble, the 
signal-to-noise in the raw data will be much higher. Not having to focus the beam to a small area and 
being able to simply average the data over pulses (rather than storing each one) also greatly simplifies the 
experiment. But these practical considerations are irrelevant in the event that the intensity J(q) cannot be 
reconstructed at all. The theoretical feasibility of the reconstruction should be the foremost criterion when 
evaluating the two schemes. 



3 Comparison of the two schemes 



As a first step in comparing the two schemes we show that the single-shot data could be processed in the 
manner of the cross-correlation data but not conversely. Moreover, since the number of photons scattered 
per particle is greater in the single-shot scheme, in the absence of practical factors this is the superior method 
for collecting data. 

Because the photons scattered by the particle ensemble combine incoherently, we can express the counts 
recorded by the detector from pulse p in the form 

K p {cu) = ^K pa {cu), (4) 

a 

where the index a identifies particles in the ensemble. If there were an oracld_| that somehow was able to 
assign particle-origin identifiers a to all the detected photons, then the added information would transform 
the data in a cross-correlation experiment into data that would be collected in a single-shot experiment, 



2 Oracles are often used in theoretical computer science to isolate particular parts of a problem. It is in this sense that the oracle 
is used in present context. 



although with a large reduction in the number of scattered photons per particle (N cc vs. N ss ). Such an 
oracle, however, does not exist, and so the cross-correlation scheme is always subject to a large information 
deficit relative to the single-shot scheme. 

We can develop the relationship between the two schemes further. Since the orientations of two particles 
in the ensemble are uncorrected, the pulse average for distinct particles a ^ (3 vanishes: 

= (K pa (qi)K p p(qj)) p - (K pa (qi)) p {K P j3(qj)) p (5) 

The cross-correlations (|2]) therefore take the form 

C(qi,qj) = J2 «- K iw(q»)-Kjw(qi))p - (^(<fc))p(Kp«(qj))p) • (6) 

a 

This equation shows that the data processed by the cross-correlation scheme can be thought of as arising 
from single particle hits, where the number of pulses is simply multiplied by the number of particles in 
the ensemble (sum over a). From the viewpoint of the data that gets processed, the cross-correlation ex- 
periments thus look like single-shot experiments. What really distinguishes the two methods is (/) how the 
"single-shot" data is processed, and (//) the difference in the number of scattered photons per particle (N ss 
vs. N cc ). 

The cross-correlation scheme has a disadvantage relative to the single-shot scheme both because of re- 
strictions in the processing and the smaller number of photons scattered per particle. These shortcomings are 
not unrelated. To illustrate this point we suppose that we have a particle that scatters on average N ss = 1000 
photons in the single-shot scheme, and that the linear size of the beam focus in the cross-correlation scheme 
is larger by a factor of 100; thus N cc = (1/100) 2 N SS = 0.1. By Poisson statistics 0.5% of the particles in 
the cross-correlation experiment will scatter two or more photons, the minimum required to contribute to 
the cross-correlation ©. To escape the limitations imposed by quadratic cross-correlations one could also 
form cubic correlations 

C(qi,qj,qk) = (K p (q i )K p (q j )K p (q k )) p (7) 

and higher. But for these correlations only 0.015% of the particles contribute (three or more photons) and 
higher orders decay very rapidly. The small value of N cc thus limits the experiment to low order correlations. 

The question of how many scattered photons (per particle) are required to reconstruct the 3D intensity of 
a particle, from randomly oriented diffraction data, is an interesting one [24]. There are really two questions: 
how many photons are required to provide the necessary information, irrespective of the computational com- 
plexity of the reconstruction; and, how many photons are needed by a practical reconstruction algorithm? 
For the second question we already have some good answers. First, we note that single-shot data could 
be processed in the style of a cross-correlation experiment, by simply averaging products of photon counts 
and forming quadratic, cubic, etc. cross-correlations. However, this data reduction/compression step is not 
done in the most successful, likelihood maximization, algorithms |[25ll2TI . By contrast, the latter type of 
algorithm utilizes all the correlations in the data in the process of iteratively refining a model intensity based 
on comparisons with all the diffraction patterns. The difficulty of these reconstructions, as measured in the 
number of iterations, grows dramatically when the number of detected photons per particle drops below a 
certain value [21 J. This number depends weakly on resolution, and for a particle that measures 16 voxels in 
diameter is about 50 (see Figure 1). These observations combined make us strongly doubt the potential for 
determining the 3D intensity from noisy, randomly oriented diffraction data by the cross-correlation method. 

For the more academic question, regarding information sufficiency of the cross-correlation data, we have 
a partial answer. Consider a 3D intensity that measures D speckles in diameter. The reconstruction of this 
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Figure 1 : Update magnitude vs. iteration in a likelihood maximization reconstruction (EMC algo- 
rithm [21]) of the intensity of a particle. The four plots correspond to different average numbers of 
photons N in the diffraction data, and are labeled by their reduced information rates [24], r(N). The 
latter (a number between zero and one) is the ratio of the information rate of diffraction data in an 
experiment with unknown particle orientations to the rate when particle orientations are known. The 
value r = 1/2 corresponds to the case where the information in each data with unknown orientation 
equals the information it provides about the particle orientation. The selection of photon numbers 
shown, r(25) = 0.42, r(45) = 0.55, r(80) = 0.72 and r(225) = 0.90 emphasizes the sudden onset 
of slow convergence when the reduced information rate falls below 1/2. The corresponding photon 
number, in this case N w 50, is weakly resolution dependent. The data for these reconstructions 
were derived from a particle measuring 16 voxels in its diameter. 



intensity corresponds to the discovery of a unique set of real parameters whose number scales as D 3 . We can 
evaluate the feasibility of this task by finding the scaling of the number of real- valued constraints provided 
by the cross-correlations. Because of rotational averaging, the quadratic cross-correlation function reduces 
to a function of only three real-number arguments: 



C(q,q') = C(M', 



(8) 



Here 9, 9' and locate q and q' on the Ewald sphere in rotation-invariant terms, i.e. two scattering angles 
and an azimuthal angle about the beam axi£j. This shows that the number of constraints also scales as D 3 . 
An answer to the feasibility question thus requires a more detailed analysis. A similar situation arises in 
connection with the standard phase problem [18], i.e. reconstructing particle contrast from autocorrelation 
(Fourier intensity) data. There the number of free variables and constraints also scale with the same power 
of resolution and uniqueness depends on the number of dimensions and the shape of the particle support. In 
the appendix we show that, for the case of a spherical particle, the number of free variables in the contrast 



3 Symmetry considerations apply to the range of the arguments of the reduced function. A detailed analysis can be found in the 
appendix. 



outnumber by a factor of 1.43 the independent constraints provided by the quadratic cross-correlationo A 
similar counting argument, when applied to an ensemble of particles aligned along the incident beam axis, 
shows that quadratic cross-correlations have sufficient information to reconstruct a 2D intensity when only 
one particle angle is unknowro This has been confirmed in simulations [16]. On the other hand, information 
sufficiency for the unaligned case could be achieved with the cubic cross-correlations. Even with rotational 
averaging taken into account, the number of independent cubic correlations scales as D 5 |j. Since photon 
numbers as low as N cc = 1 are adequate for obtaining good cubic cross-correlations, while such low 
numbers are out of reach of likelihood maximization algorithms, it appears that intensity reconstruction 
from cubic cross-correlations is a very difficult problem. 



4 Conclusions 



Whereas the cross-correlation scheme for collecting structure data from randomly oriented particles has 
practical advantages over the single-shot scheme, these advantages do not compensate for the information 
deficit that distinguishes the two schemes. Cross-correlation experiments are an interesting and appropriate 
use of the high photon flux in X-ray free-electron laser sources. However, their use in elucidating structure 
will likely be limited to statistical properties l26l l27l and fall short of complete structure recovery. 



5 Appendix: data insufficiency for spherical particles 

Consider a particle with real density p(r). The support of p(r) is spherical with radius R in units where the 
sampling density, set by the resolution, is 1. The number of density variables to be reconstructed from our 
data is therefore V = (4>n:/3)R 3 . 

We can avoid sampling issues associated with the particle intensity 7(q) — a bandlimited function — by 
working instead with its Fourier transform: the density autocorrelation: 

A(r) = I d 3 r'p(r')p(r' + r). (9) 

The autocorrelation is a real function with the symmetry property 

A(r) = A(-r). (10) 

Since A(r) has a spherical support of radius R = 2R we have 

14vr ~o 

R 3 (11) 

2 3 

independent autocorrelation samples. 

4 This conclusion is stronger than that obtained by Kam [22] and Saldin et al. 1 23 ]. 

5 The scaling of the number of speckles in the 2D intensity, D 2 , is exceeded by the scaling of the number of cross-correlations, 
D 3 . 

6 The reduced function depends on three scattering angles and two azimuthal angles about the incident beam. 



To work with the rotational averages in the data we switch to an angular momentum basis for A(r). 
Writing r = m in terms of its magnitude r and direction n, we have 

l 
A ( r ) = E E A lm {r)Y lm (n) (12) 

1=0,2,4,... m=-l 

where the restriction to even I comes from (flOl ). 

Taking the Fourier transform of (O with respect to each argument we obtain 

C(r, r') = (A(R • r)A(R • r')) R - (A(R • r)) R (A(R • r')) R , (13) 

where the averages are with respect to rotations R of the particle. Expressing A(R ■ r) in terms of the 
angular momentum basis introduces the replacement 

i 
Y lm (n)^ Yl D l mml/ (R)Y lm/l (n) (14) 

m"=—l 

in (fT2)) . where D(H) is the rotation matrix for R in the angular momentum basis. Using the fact that A(r) 
is real and the orthogonality property (averaging in the uniform distribution of R) 

(D mm ,,(R)D m , m m(R)*}^= Sii>5 mm >d m » m iH , (15) 

equation (fT3l takes the form 

-. i i 

C(r,r') = E Wf-T E A lm {r)A$ m {r') ]T Y^^Y^). (16) 

2=2,4,6,... + m=-l m'=-l 

Applying the addition theorem on the sum over m' we obtain the result 

1 l 

C(r,r>,0) = — J2 P l( cosd ) E A im(r)At m (r'), (17) 

n 1=2,4,6,... m=-l 

where cos 6 = n • n' and Pi is the Legendre polynomial. 

The next task is to count the number of independent cross-correlation constraints (fTTT ) given that the 
autocorrelation A(r) is sampled at a finite resolution. Our approach is to express ^4(r) in the basis of 
solutions to the mode equations 

- V 2 A k i m {r) = q 2 klm A klm (r) A klm (r) = 0, r > R (18) 

and thereby arrive at a discrete set of equations. As above, / and m are angular momentum quantum num- 
bers; the new index k = 1,2, . . . identifies the radial structure of the mode. By including all modes with the 
property 

Qklm <Q (19) 

for an appropriate Q, our basis for A(r) has a finite, isotropically defined resolution. Applying standard 
mode counting to modes of bounded linear momentum |q| < Q within a spherical box of volume V = 

{4ir/3)R 3 , 

fQ V n 

\ Anq 2 dq = V, (20) 

Jo (27r)- i 

7 



we obtain 

Q = (67T 2 ) 1 / 3 . (21) 

The solutions to the mode equations have the form (fT2l . where the radial functions are expanded in terms of 

spherical Bessel functions: 

k(l) 

Am (r) = ^2 A Hm ji (q k ir). (22) 

fc=i 
We have suppressed the angular momentum index m on the mode eigenvalues q k im since the latter are 
determined by the m-independent radial equation: 



lfd\ 2 1(1 + 1) 



ji(qki r) = qtijlilkl r) jiiquR) = 0. (23) 



r \dr J r 2 

The maximum radial index k{l) is defined as the maximum k that satisfies ( fl9l ). 

For the purpose of counting constraints the only thing we need to know about the modes A k [ m (r) is the 
number of radial modes k(l) for each angular momentum index /. In the limit of many modes (R — > oo) we 
estimate k{l) using the WKB approximation for the spherical Bessel function, 

sin^(r) 

Ji{qu r) ~ Bi , (24) 

r 

valid for r > r^i where r^i = \/l(l + l)/qki ~ l/lki is the turning point. For fixed /, the highest radial 
mode k(l) is the one with the smallest turning point: 

ru>r k {i)i = — = r{l). (25) 

Up to constant turning point corrections, the phase 4>ki{r) of the sine function in (|24l runs through k(l) 
multiples of tt between this smallest turning point and r = R: 



nk(l) ~ <j> m {R) - <t> m {r{l)) = I* \Q 2 -- 2 dr = RQ f(l/RQ) (26) 

Jr(l) V T 



f(x) = v 1 — x 2 — x arccosx. (27) 

The combination L = RQ is identified with the largest possible /, where there is just a single mode with 
smallest turning point r(L) equal to R. 

Taking advantage of mode orthogonality in (fTTT ) we define 

C(k,k',l)= [ ji(q k ir)r 2 dr [ ji(q kn r') r' 2 dr' f P t (cos 6) 2vr sin Odd C(r,r', 6) (28) 
Jo Jo Jo 

and using (l22l obtain the discrete set of constraint equations for the mode amplitudes of the density auto- 
correlation, 

l 

C(k,k',l)=N kkn J2 A ki m AU m , (29) 

m=—l 

where the N kk n are normalization constants: 



Nkk'l = ^— { f R jf(qkl r) r 2 dr) { [* jf(q kn r') r ,2 dr' ) . (30) 



2Z+LWo j ^ mr)r2dr U " ' ! /! " 



Both Kam [22] and Saldin et al. [23 ] obtain equations of the form d29l ) but where the spherical harmonic 
expansion of the density autocorrelation A(r) is replaced by the expansion of the intensity I(q). These 
authors also remark that since the constraint equations for different I are completely decoupled, the recon- 
struction is ambiguous up to arbitrary relative rotations applied to the different principal angular momentum 
(I) components of the intensity (or equivalently, the density autocorrelation). This conclusion, however, is 
based on the premise that the autocorrelation samples are the independent variables of the reconstruction 
problem when in fact the true independent variables are the density samples. In the case of our spherical par- 
ticle, for example, the number of autocorrelation samples exceeds the number of density samples by a factor 
of four. When the constraint equations d29l ) are expressed in terms of the spherical harmonic expansion 
of p(r) using ©, the different principal angular momentum components of p are coupled in the resulting 
quartic equations. 



The main advantage in expressing the constraint equations d29l > in terms of the autocorrelation, rather 
than the intensity, is that it is possible to count the actual number of constraints. Clearly the data d28l ) are 
symmetric with respect to interchanging k and kl . We also know that these indices range between 1 and k(l) 
for I values up to L = RQ and that the symmetry (TTQb restricts us to only even I. Altogether the number of 
independent constraint equations is therefore given by: 

L k(J) k(l) 3 , 

*- E E E 1 ~ s? j( >"<*>* -(§-£)«'■ <31) 

i=2,4,6,... fe=i k'=k *" J ° \y "/ 

Comparing with the number of free variables, V = {tt/Q)B?, we see that the reconstruction problem is 

underconstrained: 

16 
E/V = 1« 0.698. (32) 

07T 
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